On Partitional Clustering of Malware
نویسندگان
چکیده
In this paper we fully describe a novel clustering method for malware, from the transformation of data into a manipulable standardised data matrix, finding the number of clusters until the clustering itself including visualisation of the high-dimensional data. Our clustering method deals well with categorical data and clusters the behavioural data of 17,000 websites, acquired with Capture-HPC, in less than 2 minutes.
منابع مشابه
Partitional Clustering of Malware Using K-Means
This paper describes a novel method aiming to cluster datasets containing malware behavioural data. Our method transform the data into an standardised data matrix that can be used in any clustering algorithm, finds the number of clusters in the data set and includes an optional visualization step for high-dimensional data using principal component analysis. Our clustering method deals well with...
متن کاملC ONSTRAINT BASED P ARTITIONAL C LUSTERING – A C OMPREHENSIVE S TUDY AND A NALYSIS Aparna
Data clustering is the concept of forming predefined number of clusters where the data points within each cluster are very similar to each other and the data points between clusters are dissimilar to each other. The concept of clustering is widely used in various domains like bioinformatics, medical data, imaging, marketing study and crime analysis. The popular types of clustering techniques ar...
متن کاملComparison of Agglomerative and Partitional Document Clustering Algorithms
Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters, and in greatly improving the retrieval performance either via cluster-driven dimensionality reduction, term-weighting, or query expansion. This ever-increasing importance of do...
متن کاملSoft Clustering Criterion Functions for Partitional Document Clustering
Recently published studies have shown that partitional clustering algorithms that optimize certain criterion functions, which measure key aspects of interand intra-cluster similarity, are very effective in producing hard clustering solutions for document datasets and outperform traditional partitional and agglomerative algorithms. In this paper we study the extent to which these criterion funct...
متن کاملUncertain Centroid based Partitional Clustering of Uncertain Data
Clustering uncertain data has emerged as a challenging task in uncertain data management and mining. Thanks to a computational complexity advantage over other clustering paradigms, partitional clustering has been particularly studied and a number of algorithms have been developed. While existing proposals differ mainly in the notions of cluster centroid and clustering objective function, little...
متن کامل